ASYMPTOTIC GLOBAL ROBUSTNESS IN BAYESIAN DECISION THEORY

By Christophe Abraham and Benoit Cadre 

ENSAM-INRA and Université Montpellier II

In Bayesian decision theory, it is known that robustness with re- 
spect to the loss and the prior can be improved by adding new obser- 
vations. In this article we study the rate of robustness improvement 
with respect to the number of observations n. Three usual measures 
of posterior global robustness are considered: the (range of the) Bayes 
actions set derived from a class of loss functions, the maximum regret 
of using a particular loss when the subjective loss belongs to a given 
class and the range of the posterior expected loss when the loss function
ranges over a class. We show that the rate of convergence of the
first measure of robustness is $\sqrt{n}$, while it is $n$ for the other measures
under reasonable assumptions on the class of loss functions. We begin 
with the study of two particular cases to illustrate our results. 

1. Introduction. In Bayesian analysis, choosing a prior distribution and 
choosing a loss function according to prior knowledge and preferences are 
difficult tasks. In practice, the decision maker usually chooses convenient 
approximations to the subjective prior and the subjective loss. The legiti- 
macy of such approximations might be investigated by a sensitivity analysis 
of the results with respect to the approximations. This is the purpose of 
robust Bayesian analysis, which was recently surveyed by Ríos Insua and
Ruggeri (2000). An interesting approach, called global robustness, proposes 
to replace a single prior distribution (resp. loss function) by a class of priors 
(resp. loss functions) and then to compute the range of the ensuing answers 
as the prior (resp. loss function) varies over the class. 

Bayesians mainly focus on sensitivity to the prior distribution, although 
the final result can be drastically affected by the loss function. Moreover,
Rubin (1987) showed that the loss function and the prior cannot be separated
under a weak system of axioms for rational behavior. It is worth pointing
out that robustness with respect to the prior can be expressed as a
particular case of loss robustness. This is illustrated by the following example:
the computation of the range of the posterior expectation when the prior
density $p$ ranges over a class $\Gamma$ reduces to the computation of the range of
the Bayes actions (i.e., decisions that minimize the posterior expected loss)
when the loss function ranges over the class $\{l_2\,p/p_0,\ p \in \Gamma\}$, where $l_2$ is the
quadratic loss and $p_0$ is a fixed prior.
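
The reduction of prior robustness to loss robustness can be checked numerically. The following is a minimal sketch on a discretized parameter space; the grid, the data, the normal likelihood and the two priors are illustrative assumptions that do not come from the article. It compares the posterior expectation under a prior $p$ with the Bayes action for the loss $l_2\,p/p_0$ under the posterior derived from $p_0$.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Grid of parameter values sigma (a discretization chosen for illustration).
grid = [-6.0 + 0.01 * i for i in range(1201)]

# Hypothetical data and likelihood: three N(sigma, 1) observations.
data = [0.8, 1.5, 1.1]
likelihood = [math.prod(normal_pdf(x, s, 1.0) for x in data) for s in grid]

p0 = [normal_pdf(s, 0.0, 1.0) for s in grid]  # fixed prior p_0 (assumed)
p = [normal_pdf(s, 1.0, 2.0) for s in grid]   # another prior p in the class (assumed)

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

# Posterior expectation computed directly under the prior p.
post_p = normalize([a * b for a, b in zip(p, likelihood)])
mean_direct = sum(s * q for s, q in zip(grid, post_p))

# Bayes action for the loss (d - sigma)^2 * p/p0 under the posterior from p0:
# the minimizer of a weighted quadratic is the weighted mean.
post_p0 = normalize([a * b for a, b in zip(p0, likelihood)])
weights = [(a / b) * q for a, b, q in zip(p, p0, post_p0)]
action_weighted = sum(s * w for s, w in zip(grid, weights)) / sum(weights)

print(mean_direct, action_weighted)
```

Both quantities agree up to rounding error, since the factor $p/p_0$ converts the posterior derived from $p_0$ into the posterior derived from $p$.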

When robustness is lacking, Abraham (2001) showed it can be improved 
by adding new observations. It is of practical interest to know how many 
new observations are needed to achieve a given robustness. Herein we answer 
this question by investigating the asymptotic rate of convergence of three 
measures of posterior robustness. Because of the above remark, we focus on 
robustness with respect to the loss, since it provides a general framework 
including many prior robustness problems. 

The asymptotics of global robustness measures (e.g., the range of posterior
means or set probabilities) with respect to the prior has been investigated for
particular classes (mainly $\varepsilon$-contamination classes) by Sivaganesan (1988),
Pericchi and Walley (1991), Moreno and Pericchi (1993) and Ruggeri and 
Sivaganesan (2000). The local point of view has been studied by Gustafson 
and Wasserman (1995), Gustafson, Srinivasan and Wasserman (1996) and 
Sivaganesan (1996). For a recent account of the theory, refer to Sivaganesan 
(2000). 

In Sections 4-6, we proceed with the study of three measures of posterior 
global robustness. Section 4 is devoted to the study of the Bayes actions set 
derived from a class of loss functions. We show that the Bayes actions set 
tends to a limit set with rate $\sqrt{n}$, where $n$ is the number of observations. In
Section 5, we are concerned with the regret of choosing a decision associated
with a particular loss function when the true loss function varies over a given
class. We show that the rate of convergence of the supremum of the regrets
is $\sqrt{n}$ or $n$, according to the class of loss functions. Section 6 deals with the
range of the posterior expected loss, which has asymptotic rate $\sqrt{n}$ or $n$ as
well. Section 2 provides two examples. For one of them, the above asymptotic
rates are actually achieved for every finite $n$. In Section 3 we set up notation
and terminology. In particular, we indicate that the posterior distribution
can be calculated under misspecified models, that is, we contemplate that
the observations are realizations from a convenient probability distribution
with density $h_\sigma$ ($\sigma$ is the parameter), while the true distribution $Q$ may
not correspond to $h_\sigma$ for any value of $\sigma$. Finally, we compile some auxiliary
results in Section 8. 

2. Examples. In this section, we present two examples based on tractable 
classes of loss functions. Such classes have already been considered in Martin, 
Ríos Insua and Ruggeri (1998) and Abraham and Daures (1999, 2000).

2.1. Squared-error loss. Since squared-error loss is frequently used to
approximate nearly symmetric loss functions [Berger (1985)], it is of practical
interest to investigate robustness with respect to variations around this loss.
It is also of theoretical interest because it makes the calculations relatively
simple.

The set $\Theta$ of parameters and the set $\mathcal{D}$ of decisions are both assumed to
be $\mathbb{R}$. Fix $0 < k_1 < k_2$, depending on the incomplete information on the true
loss, and define $U : \Theta \times \mathcal{D} \to \mathbb{R}_+$ as

(2.1)  $U(\sigma, d) = (k_2\{d > \sigma\} + k_1\{d < \sigma\})\, l_0(\sigma, d)$,

where $\{C\}$ denotes the usual indicator function of $C$ and $l_0(\sigma, d) = 0.5(d - \sigma)^2$
denotes the convenient loss chosen by the decision maker. Define $L$ by
interchanging $k_1$ and $k_2$ in the definition of $U$. Let $D_{01}l$ stand for the derivative
of $l : \Theta \times \mathcal{D} \to \mathbb{R}_+$ with respect to $d$ and introduce the class $\mathcal{F}$ of loss functions
$l : \Theta \times \mathcal{D} \to \mathbb{R}_+$ such that for all $\sigma \in \Theta$, $D_{01}l(\sigma, \cdot)$ is continuous, $l(\sigma, \sigma) = 0$
and $D_{01}L \le D_{01}l \le D_{01}U$.

Assume that $X_1, \ldots, X_n$ are independent and identically distributed from
a normal $N(\mu, \lambda^{-1})$ distribution, where the variance $\lambda^{-1}$ is known. Take
a $N(\mu_0, \lambda_0^{-1})$ prior. The posterior $\pi_n$ is then normal $N(\mu_n, \lambda_n^{-1})$ with
$\mu_n = (\lambda_0\mu_0 + \lambda(X_1 + \cdots + X_n))/\lambda_n$ and precision $\lambda_n = \lambda_0 + n\lambda$. Denoting, for all
$l \in \mathcal{F}$, $d_l^n$ as a minimizer of $l^n(\cdot) = \int_\Theta l(\sigma, \cdot)\,\pi_n(d\sigma)$, elementary calculations
show that $U^n$ and $L^n$ admit only one minimizer given by

  $d_U^n = \mu_n + r_1/\sqrt{\lambda_n}$ and $d_L^n = \mu_n + r_2/\sqrt{\lambda_n}$,

where $r_2 < 0 < r_1$ are constants depending on $k_1$ and $k_2$.

Let us now investigate the computation of the three measures of posterior
robustness. Since, by Abraham and Daures (1999), $\{d_l^n, l \in \mathcal{F}\} = [d_U^n, d_L^n]$,
the diameter of $\{d_l^n, l \in \mathcal{F}\}$ is equal to $(r_1 - r_2)/\sqrt{\lambda_n}$, which gives the first
measure of robustness. Write now $\mathrm{reg}_l^n(d) = l^n(d) - \inf_{\mathcal{D}} l^n$ for the posterior
regret. By the definition of $\mathcal{F}$, if $d_2 > d_1$, we have for all $l \in \mathcal{F}$,

  $l^n(d_2) - l^n(d_1) = \int_{d_1}^{d_2} D_{01}l^n(t)\,dt = \int_{d_1}^{d_2}\int_\Theta D_{01}l(\sigma, t)\,\pi_n(d\sigma)\,dt$.

Hence, we deduce that

(2.2)  $\sup_{l \in \mathcal{F}} \mathrm{reg}_l^n(d) = \max\{\mathrm{reg}_U^n(d), \mathrm{reg}_L^n(d)\}$.

Let $d_0^n = \mu_n$ be the Bayes rule associated with the squared-error loss
function $l_0$. After some calculations, we obtain that for some constants $c_1$ and
$c_2$,

  $U^n(d_0^n) - U^n(d_U^n) = c_1/\lambda_n$ and $L^n(d_0^n) - L^n(d_L^n) = c_2/\lambda_n$,

and hence

  $\sup_{l \in \mathcal{F}} \mathrm{reg}_l^n(d_0^n) = \max(c_1, c_2)/\lambda_n$,

which gives the second measure of robustness. Finally, if $S = k_2 l_0$ and $I = k_1 l_0$,
then $I, S \in \mathcal{F}$ and $I \le l \le S$ for all $l \in \mathcal{F}$. Then if we write
$\mathrm{ran}^n(d) = \sup_{l \in \mathcal{F}} l^n(d) - \inf_{l \in \mathcal{F}} l^n(d)$ for the range of the posterior expected loss, we
obviously have

  $\mathrm{ran}^n(d_0^n) = S^n(d_0^n) - I^n(d_0^n) = 0.5(k_2 - k_1)/\lambda_n$,

hence the third measure of robustness.

We emphasize that the constants $r_1$, $r_2$, $c_1$ and $c_2$ can be numerically
computed and that similar calculations can be done with different functions
$U$, $L$ and $l_0$. As a conclusion, we proved that, for the class $\mathcal{F}$, the speed
of convergence of the diameter of $\{d_l^n, l \in \mathcal{F}\}$ is $\sqrt{n}$, while the speed of
convergence of the posterior regret and the range of the posterior expected
loss are $n$.
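
The numerical computation of these constants can be sketched as follows. This is an illustration under assumptions, not the authors' code: the values $k_1 = 1$ and $k_2 = 2$ and all function names are hypothetical. With $\sigma = \mu_n + Z/\sqrt{\lambda_n}$ and $d = \mu_n + r/\sqrt{\lambda_n}$, the derivative of the posterior expected loss is proportional to a function of $r$ alone, so the standardized offsets of the two minimizers do not depend on $n$ and can be found by bisection.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def score(r, k_over, k_under):
    """Derivative (up to a positive factor) of the posterior expected loss
    at d = mu_n + r / sqrt(lambda_n), when d > sigma is weighted by k_over
    and d < sigma by k_under."""
    over = r * Phi(r) + phi(r)             # E[(r - Z)^+]
    under = phi(r) - r * (1.0 - Phi(r))    # E[(Z - r)^+]
    return k_over * over - k_under * under

def standardized_minimizer(k_over, k_under):
    """Root of the increasing function score(., k_over, k_under) by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, k_over, k_under) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

K1, K2 = 1.0, 2.0                            # illustrative values (assumed)
OFFSET_U = standardized_minimizer(K2, K1)    # U weights d > sigma by k_2
OFFSET_L = standardized_minimizer(K1, K2)    # L weights d > sigma by k_1

def robustness_measures(mu_n, lam_n):
    """Bayes actions for U and L, their distance, and the range at mu_n."""
    d_u = mu_n + OFFSET_U / math.sqrt(lam_n)
    d_l = mu_n + OFFSET_L / math.sqrt(lam_n)
    diameter = abs(d_l - d_u)
    loss_range = 0.5 * (K2 - K1) / lam_n
    return d_u, d_l, diameter, loss_range

for lam_n in (1.0, 101.0, 10001.0):
    print(lam_n, robustness_measures(0.0, lam_n))
```

In this sketch the diameter multiplied by $\sqrt{\lambda_n}$ and the range multiplied by $\lambda_n$ stay constant in $n$, which is the behavior stated above.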

2.2. The dam construction problem. Following Ulmo and Bernier (1973),
the economical consequence of constructing a dam $d$ meters high is the sum
of the construction cost and the cost due to a potential flood,
$10d + 100(H - d)\{H > d\}$, where $H$ is the peak water level. Note that the consequence is a
random variable. Assuming that $H$ is exponentially distributed with density
$h_\sigma(x) = \sigma e^{-\sigma x}$ and taking the expectation yields the loss

  $l_0(\sigma, d) = 10d + 100\,\sigma^{-1}\exp(-d\sigma)$.

A similarly constructed utility function can be found in Berger [(1985), page
58]. The loss $l_0$ can be viewed as a convenient approximation to the true loss.
Let us proceed similarly to Section 2.1 to study the robustness of the Bayes
action. Consider the class $\mathcal{F}$ of functions $l$ such that $D_{01}L \le D_{01}l \le D_{01}U$.
Since the minimum of $l_0(\sigma, \cdot)$ is obtained when $d\sigma = \log 10$, we define

  $U(\sigma, d) = (\Phi(d\sigma - \log 10) + 0.5)\, l_0(\sigma, d)$

and

  $L(\sigma, d) = (1.5 - \Phi(d\sigma - \log 10))\, l_0(\sigma, d)$,

where $\Phi$ denotes the cumulative distribution function of $N(0, 1)$. Let $d_l^n$ and
$d_0^n$ be generic notation for the Bayes actions associated with the loss
functions $l$ and $l_0$, respectively. It can be proved that $U(\sigma, \cdot)$ and $L(\sigma, \cdot)$ are
convex functions with a unique minimizer. Thus, the set of Bayes actions is still
$[d_U^n, d_L^n]$ and the largest posterior regret can be calculated by (2.2). The
posterior distribution is derived from $n$ independent observations with density
$h_\sigma$ and a reference prior $\pi(\sigma) = \sigma^{-1}$ [so that $\pi_n \sim \mathrm{Gamma}(n, \sum_{i=1}^n X_i)$]. We
simulated $n = 100$ observations with respect to $h_{0.5}$ and computed numerically

  $\sum_{i=1}^n X_i = 193.6$, $d_U^n = 2.7$, $d_L^n = 7.7$, $d_0^n = 4.5$ and $\sup_{l \in \mathcal{F}} \mathrm{reg}_l^n(d_0^n) = 19.5$.



Thus, the optimal dam size is somewhere between 2.7 and 7.7 m, and using
the optimal decision associated with $l_0$ gives an excess posterior loss less
than 19.5. Can we get more precise results by adding new observations?
Sections 4 and 5 answer in the negative. Indeed, Theorem 4.1 applied to
$\mathcal{C} = \{U, L\}$ shows that the range of the optimal sizes approaches $d_L^\theta - d_U^\theta$
with rate $\sqrt{n}$, where $\theta$ is the true value of the parameter $\sigma$, and $d_U^\theta$ and
$d_L^\theta$ are the minimizers of $U(\theta, \cdot)$ and $L(\theta, \cdot)$. From the data we can guess $\theta$
to be about 0.5 (because $1/\bar{x} = 0.51$) and deduce that $d_L^\theta - d_U^\theta$ is around 5
by numerical computation of $d_L^\theta$ and $d_U^\theta$ for $\theta = 0.5$. Since $d_L^n - d_U^n = 5$, we
cannot expect to improve the result. Note that the class $\mathcal{F}$ is large since,
even when $\theta$ is given, it is only known that the optimal size is somewhere
between $d_U^\theta$ and $d_L^\theta$. Also note that if we had chosen a class $\mathcal{F}$ such that
$d_U^\theta = d_L^\theta$, the range of the optimal sizes could have been arbitrarily reduced
by adding observations [see Abraham (2001) for a description of the limit
of the Bayes actions set]. Similarly, we know from Theorem 5.1 that the
largest posterior regret approaches $\max\{\mathrm{reg}_U^\theta(d_0^\theta), \mathrm{reg}_L^\theta(d_0^\theta)\}$, which remains
about 20, where $d_0^\theta$ denotes the minimizer of $l_0(\theta, \cdot)$.
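
The limiting quantities quoted for $\theta = 0.5$ can be recomputed with a few lines of code. The sketch below is only an illustration: the grid search over $[0, 20]$ and the function names are assumptions, chosen for simplicity rather than efficiency.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def l0(sigma, d):
    return 10.0 * d + 100.0 / sigma * math.exp(-d * sigma)

def U(sigma, d):
    return (Phi(d * sigma - math.log(10.0)) + 0.5) * l0(sigma, d)

def L(sigma, d):
    return (1.5 - Phi(d * sigma - math.log(10.0))) * l0(sigma, d)

def grid_argmin(loss, sigma, lo=0.0, hi=20.0, steps=20000):
    """Minimizer of d -> loss(sigma, d) on a regular grid (assumed search range)."""
    best_d, best_value = lo, loss(sigma, lo)
    for i in range(1, steps + 1):
        d = lo + (hi - lo) * i / steps
        value = loss(sigma, d)
        if value < best_value:
            best_d, best_value = d, value
    return best_d

THETA = 0.5
d0 = grid_argmin(l0, THETA)   # minimizer of l_0(theta, .)
dU = grid_argmin(U, THETA)    # minimizer of U(theta, .)
dL = grid_argmin(L, THETA)    # minimizer of L(theta, .)

# Regrets of d0 under U and L at the parameter value theta.
reg_U = U(THETA, d0) - U(THETA, dU)
reg_L = L(THETA, d0) - L(THETA, dL)

print(d0, dU, dL, dL - dU, max(reg_U, reg_L))
```

Under these assumptions the distance between the two minimizers comes out close to 5 and the larger regret close to 20, in line with the values reported above.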

3. Preliminaries and notation. 

3.1. The model. Let $X = (X_1, X_2, \ldots)$ be a sample sequence of independent
and identically distributed random variables defined on some measurable
space $(\mathcal{X}_0, \mathcal{B}_0)$, where $\mathcal{B}_0$ denotes the Borel $\sigma$-field of $\mathcal{X}_0$. In the sequel $Q$
refers to the joint distribution on $(\mathcal{X}, \mathcal{B})$ of the sequence $X$, where $\mathcal{X} = \mathcal{X}_0^\infty$
and $\mathcal{B}$ denotes the Borel $\sigma$-field of $\mathcal{X}$.

We introduce the family of probability densities $\{h_\sigma, \sigma \in \Theta\}$ with respect
to some $\sigma$-finite measure $\mu$ on $(\mathcal{X}_0, \mathcal{B}_0)$, where the parameter space $\Theta$ is $\mathbb{R}^k$
with Borel $\sigma$-field $\mathcal{B}_\Theta$. Note that the model may be misspecified since we
do not assume that $Q$ corresponds to any of the densities $h_\sigma$. For technical
reasons, we make the additional assumption that $(\sigma, x_0) \mapsto h_\sigma(x_0)$ is $\mathcal{B}_\Theta \otimes \mathcal{B}_0$
measurable.

From now on, we fix a prior distribution $\pi$ on $(\Theta, \mathcal{B}_\Theta)$. The existence
of the posterior distribution for misspecified models was studied by Berk
(1970). For simplicity, we assume that the posterior distribution $\pi_n$, defined
for all $A \in \mathcal{B}_\Theta$ by

  $\pi_n(A) = \dfrac{\int_A \prod_{i=1}^n h_\sigma(X_i)\,\pi(d\sigma)}{\int_\Theta \prod_{i=1}^n h_\sigma(X_i)\,\pi(d\sigma)}$,

does exist $Q$-almost surely.

We assume the model $h_\sigma$ to be regular enough so that the maximum
likelihood estimate $\theta_n$ is asymptotically normal [i.e., for some $\theta \in \Theta$, $\sqrt{n}(\theta_n - \theta)$
converges in distribution to a normal random variable $Z_\theta$] and the posterior
distribution concentrates around the true value of the parameter as
$n \to \infty$. The precise assumptions M on the model are given in the beginning
of Section 8. Sufficient conditions for the existence and the asymptotic
normality of $\theta_n$ (i.e., assumption M1) with misspecified models were given
by White (1982) for the case when $\Theta$ is compact. Moreover, Abraham and
Cadre (2002) studied the concentration of $\pi_n$ around the true value of the
parameter; see also Strasser (1976) when the model is correctly specified.
More precisely, both works give sufficient conditions so that M2-M4 hold.
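
The posterior $\pi_n$ is built from the working densities $h_\sigma$ and the prior only, so it can be computed whether or not the data really come from the model. The following sketch evaluates it on a finite grid; the grid, the uniform prior on the grid, the data values and the function names are assumptions made for the illustration, with the exponential densities $h_\sigma(x) = \sigma e^{-\sigma x}$ of Section 2.2 as the working model.

```python
import math

def grid_posterior(data, grid, log_density, log_prior):
    """Posterior weights on a finite grid of parameter values.
    The data need not be generated by any of the working densities."""
    log_post = [log_prior(s) + sum(log_density(x, s) for x in data) for s in grid]
    top = max(log_post)                        # subtract the max for stability
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def exp_log_density(x, sigma):
    """Log of the working density h_sigma(x) = sigma * exp(-sigma * x)."""
    return math.log(sigma) - sigma * x

grid = [0.01 * i for i in range(1, 301)]           # sigma in (0, 3]
data = [1.2, 0.7, 2.9, 1.8, 2.4, 0.3, 3.1, 1.6]    # hypothetical observations
posterior = grid_posterior(data, grid, exp_log_density, lambda s: 0.0)

# With a uniform prior on the grid, the posterior mode is the grid point
# closest to the maximum likelihood estimate 1 / mean(data).
mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(mode, 1.0 / (sum(data) / len(data)))
```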

3.2. The basic class of loss functions. For simplicity, let $\mathcal{D} = \mathbb{R}^p$ be the
decision space. In the sequel a loss function is defined to be a function
$l : \Theta \times \mathcal{D} \to \mathbb{R}_+$ such that $l(\cdot, d)$ is measurable for each $d \in \mathcal{D}$ and $l(\sigma, \cdot)$ is
twice continuously differentiable for each $\sigma \in \Theta$.

If $\sigma_i$ (resp. $d_i$) denotes the $i$th component of $\sigma \in \Theta$ (resp. $d \in \mathcal{D}$), we write,
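when they exist,

  $D_{10}l = (\partial l/\partial\sigma_i)_i$, $D_{01}l = (\partial l/\partial d_i)_i$, $D_{20}l = (\partial^2 l/\partial\sigma_i\partial\sigma_j)_{i,j}$,
  $D_{02}l = (\partial^2 l/\partial d_i\partial d_j)_{i,j}$ and $D_{11}l = (\partial^2 l/\partial d_i\partial\sigma_j)_{i,j}$,

where $i$ and $j$ stand for the row index and the column index, respectively.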

In this article a class $\mathcal{C}$ of loss functions is said to be locally $\pi$-dominated
if, for all $d \in \mathcal{D}$, there exist a function $g \in L_1(\pi)$ which is bounded on a
neighborhood of $\theta$, and an open ball $B(d, r)$ with center $d$ and radius $r > 0$
such that

  $\sup_{l \in \mathcal{C}} \sup_{t \in B(d,r)} \|D_{0\gamma}l(\sigma, t)\| \le g(\sigma)$,  $\sigma \in \Theta$, $\gamma = 0, 1, 2$,

with the notation $D_{00}l = l$. Here and in the sequel $\|a\|$ denotes the maximum
of the absolute values of the coordinates of a vector or a matrix $a$ with real
coefficients. Thus, a locally $\pi$-dominated class is also locally $\pi_n$-dominated
on the event $\{\int g(\sigma)\,\pi_n(d\sigma) < \infty\}$, the probability of which tends to 1 when
$n \to \infty$ by Lemma 8.1. Since this article deals with convergence in probability
and in distribution, we may restrict our attention to the elements of this set.

To shorten notation, we write $l^n(d) = \int_\Theta l(\sigma, d)\,\pi_n(d\sigma)$ as the expectation
of $l(\cdot, d)$ with respect to $\pi_n$. Note that in a locally $\pi$-dominated class
differentiation and integration can be interchanged, and we let

  $D_{0\gamma}l^n(d) = \int_\Theta D_{0\gamma}l(\sigma, d)\,\pi_n(d\sigma)$,  $\gamma = 1, 2$.

Furthermore, if $D_{0\gamma}l(\sigma, \cdot)$ is continuous for $\pi$-almost all $\sigma$, $D_{0\gamma}l^n$ is
continuous as well.

3.3. The Bayes action process. Since, for each loss function $l$, $l^n(d)$ is
a measurable function of $x$ and a continuous function of $d$, it is possible, for
each $x \in \mathcal{X}$ such that $\arg\min_{d \in \mathcal{D}} l^n(d) \ne \emptyset$, to select a minimizing decision
$d_l^n(x)$ in such a manner that the function $x \mapsto d_l^n(x)$ is $\mathcal{B}$ measurable
[Rockafellar and Wets (1998), Theorem 14.37]. The decision $d_l^n$ is called the Bayes
action associated with the loss $l$.

We use the outer probability theory to avoid strong assumptions on $\mathcal{C}$ that
ensure the measurability of $(d_l^n)_{l \in \mathcal{C}}$. We denote by $Q^*$ the outer probability
derived from $Q$, by $Y_n \xrightarrow{Q^*} Y$ the convergence in outer probability and by
$Y_n \rightsquigarrow Y$ the weak convergence (with respect to $Q^*$) of $Y_n$ to $Y$. For more
details about outer probability, refer to van der Vaart and Wellner (1996).

Throughout this article $\mathcal{C}$ denotes a locally $\pi$-dominated class of loss
functions such that the outer probability that $\arg\min_{d \in \mathcal{D}} l^n(d) = \emptyset$ for some
$l \in \mathcal{C}$ is zero. We then define a Bayes actions process to be a family $(d_l^n)_{l \in \mathcal{C}}$
of minimizing decisions. We equip the space of functions from $\mathcal{C}$ into the
space of matrices with real coefficients with the supremum norm.

4. Asymptotics of the Bayes actions process. This section is devoted to
the study of the Bayes actions process. To get asymptotic results, it is
necessary to put some restrictions on $\mathcal{C}$. We assume throughout that $\mathcal{C}$ satisfies
the following properties [recall that $\theta$ is fixed (see Section 3.1)]:

1a. For every $l \in \mathcal{C}$, $\arg\min l(\theta, \cdot) = \{d_l^\theta\}$.

1b. There exists a neighborhood $V_\theta$ of $\theta$ such that, for all $l \in \mathcal{C}$, $D_{01}l(\cdot, d_l^\theta)$
is continuously differentiable on $V_\theta$.

1c. $\sup_{l \in \mathcal{C}} \|D_{11}l(\theta, d_l^\theta)\| < \infty$, $\sup_{l \in \mathcal{C}} \|D_{02}l(\theta, d_l^\theta)\| < \infty$ and
$\inf_{l \in \mathcal{C}} |\det D_{02}l(\theta, d_l^\theta)| > 0$.

1d. The families $\{D_{11}l(\cdot, d_l^\theta)|_{V_\theta}, l \in \mathcal{C}\}$, $\{D_{02}l(\cdot, d_l^\theta)|_{V_\theta}, l \in \mathcal{C}\}$ and
$\{l(\cdot, d)|_{V_\theta}, l \in \mathcal{C}, d \in K\}$ are equicontinuous at $\theta$ for any compact $K \subset \mathcal{D}$.

Let $B(c, r)$ be generic notation for an open ball with center $c$ and radius $r > 0$.

1e. For every $\eta > 0$, there exists $\rho_\eta \in L_1(\pi)$ with $\sup_{\sigma \in V_\theta} \rho_\eta(\sigma) \to 0$ as $\eta \to 0$ and
such that for all $\sigma \in \Theta$ we have

  $\sup_{l \in \mathcal{C}} \sup_{d \in B(d_l^\theta, \eta)} \|D_{02}l(\sigma, d) - D_{02}l(\sigma, d_l^\theta)\| \le \rho_\eta(\sigma)$.

1f. There exist $r > 0$ and a compact set $K \subset \mathcal{D}$ such that

  $\sup_{l \in \mathcal{C}} \inf_{d \in \mathcal{D}} l(\theta, d) < \inf_{l \in \mathcal{C}} \inf_{\sigma \in B(\theta, r)} \inf_{d \in K^c} l(\sigma, d)$.

1g. For every $\eta > 0$,

  $\kappa(\eta) = \inf_{l \in \mathcal{C}} \inf_{d \in B^c(d_l^\theta, \eta)} [l(\theta, d) - l(\theta, d_l^\theta)] > 0$.

The homogeneity of $\mathcal{C}$ is ensured by conditions 1b-1e. From 1f we prove that
the Bayes actions remain in a compact set (Lemma 8.2). Let us illustrate
the assumptions by the following examples.

Example 4.1 (Prior robustness). Let $\Gamma$ be a class of densities with
respect to (w.r.t.) the Lebesgue measure $m$ on $\mathbb{R}$ and assume $\pi$ has a
positive density $w_0$ w.r.t. $m$. Consider the class $\mathcal{C}$ of functions
$l(\sigma, d) = (d - a(\sigma))^2 w(\sigma)/w_0(\sigma)$ with $w \in \Gamma$. For instance, we take $a(\sigma) = \sigma$ or
$a(\sigma) = \{\sigma \in S\}$ whether we are interested in the posterior expectation or the
posterior probability of a set $S$. For simplicity, let us choose $a(\sigma) = \sigma$. Assume
that $w_0$ and each $w \in \Gamma$ are continuously differentiable on a neighborhood $V_\theta$
of $\theta$. If furthermore $\sup_{w \in \Gamma} \sup_{\sigma \in V_\theta} w(\sigma) < \infty$, $\sup_{w \in \Gamma} \sup_{\sigma \in V_\theta} |w'(\sigma)| < \infty$
and $\inf_{w \in \Gamma} \inf_{\sigma \in V_\theta} w(\sigma) > 0$, assumptions 1a-1g are fulfilled.

Classes as in Example 4.1 include density band classes, mixture classes
and $\varepsilon$-contamination classes with adequate conditions. [Conditions on the
$\varepsilon$-contamination class are those used by Sivaganesan (1996).]

Example 4.2. Consider the case $\Theta = \mathcal{D} = \mathbb{R}$. Assume that
$\int_\Theta |\sigma|^p\,\pi(d\sigma) < \infty$ and let $g : \mathbb{R} \to [0, \infty)$ be a polynomial of degree $p$. Consider
the class $\mathcal{G}$ of three times differentiable non-negative functions $f$ such that
$|f^{(3)}(t)| \le g(t)$. Assume further that $f$ is decreasing on $(-\infty, 0]$ and
increasing on $[0, \infty)$ with a unique minimizer at 0 and that there exists
$M > 0$ such that $\sup_{f \in \mathcal{G}} f(0) < \infty$, $\sup_{f \in \mathcal{G}} f(0) < \inf_{f \in \mathcal{G}} \inf_{|t| \ge M} f(t)$ and
$0 < \inf_{f \in \mathcal{G}} f''(0) \le \sup_{f \in \mathcal{G}} f''(0) < \infty$. Then the class $\mathcal{C}$ of loss functions
$l(\sigma, d) = f(d - \sigma)$, $f \in \mathcal{G}$, satisfies every assumption of Section 3.2 and 1a-1g
of Section 4.

This example includes, for instance, parametric classes (with Linex losses)
and $\varepsilon$-contamination classes with adequate conditions [for definitions and
examples of classes of loss functions, refer to Ríos Insua and Ruggeri (2000)].

To shorten notation, we write $\varphi(l)$ instead of $[D_{02}l(\theta, d_l^\theta)]^{-1} D_{11}l(\theta, d_l^\theta)$.

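Theorem 4.1. Under the assumptions M:

(i) $\sqrt{n} \sup_{l \in \mathcal{C}} \|(d_l^n - d_l^\theta) + \varphi(l)(\theta_n - \theta)\| \xrightarrow{Q^*} 0$.

(ii) $\sqrt{n}\,(d_l^n - d_l^\theta)_{l \in \mathcal{C}} \rightsquigarrow (-\varphi(l) Z_\theta)_{l \in \mathcal{C}}$.

(iii) $\sqrt{n} \sup_{l \in \mathcal{C}} \|d_l^n - d_l^\theta\| \rightsquigarrow \sup_{l \in \mathcal{C}} \|\varphi(l) Z_\theta\|$.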

From a robust point of view it is of interest to know the rate of convergence
of the Bayes actions set with respect to the Hausdorff metric $h$. Let
$A = \{d_l^\theta, l \in \mathcal{C}\}$ and $A_n = \{d_l^n, l \in \mathcal{C}\}$. Recall that $h(A_n, A) \le \delta$ if and only if
every point in $A$ is within distance $\delta$ of at least one point in $A_n$ and vice
versa. Thus, $h(A, A_n) \le \sup_{l \in \mathcal{C}} \|d_l^n - d_l^\theta\|$ and, by Theorem 4.1,

  $(\sqrt{n}/u_n)\, h(A, A_n) \xrightarrow{Q^*} 0$

for any sequence $(u_n)$ of positive numbers such that $u_n \to \infty$, thus improving the
main result in Abraham (2001). Clearly, the same result holds if $h(A, A_n)$ is
replaced by (diameter $A_n$ - diameter $A$). Assuming moreover that $\mathcal{D} = \Theta = \mathbb{R}$
and $d_l^\theta = d^\theta$ is independent of $l \in \mathcal{C}$, we get from Theorem 4.1,

  $\sqrt{n}$ diameter $A_n \rightsquigarrow \sup_{l \in \mathcal{C}}(\varphi(l) Z_\theta) - \inf_{l \in \mathcal{C}}(\varphi(l) Z_\theta)$.
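
The inequality between the Hausdorff distance and the supremum of the pointwise distances can be illustrated on finite sets of real-valued actions. In the sketch below the loss labels and the numerical values are invented for the example; only the definition of the Hausdorff metric recalled above is used.

```python
def hausdorff(a, b):
    """Hausdorff distance between two finite sets of real numbers."""
    def directed(x, y):
        # Largest distance from a point of x to its nearest point in y.
        return max(min(abs(p - q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))

# Hypothetical Bayes actions indexed by the same three losses.
limit_actions = {"l1": 1.0, "l2": 2.0, "l3": 4.0}      # plays the role of A
finite_n_actions = {"l1": 1.3, "l2": 1.9, "l3": 4.2}   # plays the role of A_n

h = hausdorff(list(finite_n_actions.values()), list(limit_actions.values()))
sup_dist = max(abs(finite_n_actions[k] - limit_actions[k]) for k in limit_actions)

print(h, sup_dist)
```

Pairing each action with the action of the same loss gives an admissible matching between the two sets, which is why the Hausdorff distance never exceeds the supremum of the pointwise distances.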

Example 4.1 (continued). Assume that for some $\bar{w} \in \Gamma$ with
$\int \sigma^2\,\bar{w}(\sigma)\,d\sigma < \infty$ we have $w \le \bar{w}$ for all $w \in \Gamma$. The class $\mathcal{C}$ is then $\pi$-dominated.
Write $\tilde{l}^n(d) = \int_\Theta (d - \sigma)^2\,w_n(d\sigma)$, where $w_n$ is the posterior distribution
derived from the prior density $w$, and denote by $\tilde{A}_n$ the set of posterior
expectations. Since $\varphi(l) = -1$ and $\tilde{A}_n = A_n$, we deduce from above that
$\sqrt{n}$ diameter $\tilde{A}_n \rightsquigarrow 0$.

Example 4.2 (continued). Since $\varphi(l) = -1$, we have $\sqrt{n}$ diameter $A_n \rightsquigarrow 0$.
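
The value $\varphi(l) = -1$ for losses of the form $f(d - \sigma)$ can be checked by finite differences. The sketch below is an illustration under assumptions: it takes the Linex-type function $f(t) = e^{-t} + t - 1$ used later in Section 7, an arbitrary value $\theta = 0.7$, and hypothetical helper names.

```python
import math

def linex(t):
    """Linex-type function f(t) = exp(-t) + t - 1, minimized at t = 0."""
    return math.exp(-t) + t - 1.0

def loss(sigma, d):
    return linex(d - sigma)

def d02(l, sigma, d, h=1e-4):
    """Central finite-difference approximation of the second derivative in d."""
    return (l(sigma, d + h) - 2.0 * l(sigma, d) + l(sigma, d - h)) / (h * h)

def d11(l, sigma, d, h=1e-4):
    """Central finite-difference approximation of the mixed derivative."""
    return (l(sigma + h, d + h) - l(sigma + h, d - h)
            - l(sigma - h, d + h) + l(sigma - h, d - h)) / (4.0 * h * h)

theta = 0.7       # illustrative parameter value (assumed)
d_theta = theta   # the minimizer of f(d - theta) is d = theta

phi_l = d11(loss, theta, d_theta) / d02(loss, theta, d_theta)
print(phi_l)
```

The second derivative in $d$ equals $f''(0)$ and the mixed derivative equals $-f''(0)$, so the ratio is $-1$ whatever the function $f$.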

Proof of Theorem 4.1. Recall that integration and differentiation
can be interchanged in a locally $\pi$-dominated class. By definition of $d_l^n$,
$0 = D_{01}l^n(d_l^n)$. For $s \in [0, 1]$ write $t_{l,s}^n = d_l^\theta + s(d_l^n - d_l^\theta)$. By Taylor's formula
we have

  $0 = \sqrt{n}\,D_{01}l^n(d_l^\theta) + \sqrt{n} \int_0^1 D_{02}l^n(t_{l,s}^n)(d_l^n - d_l^\theta)\,ds$

  $\ \ = \sqrt{n}\,\bigl(D_{01}l^n(d_l^\theta) - D_{11}l(\theta, d_l^\theta)(\theta_n - \theta)\bigr)$
  $\ \ \ \ + \Bigl[\int_0^1 D_{02}l^n(t_{l,s}^n)\,ds\Bigr] \sqrt{n}(d_l^n - d_l^\theta) + D_{11}l(\theta, d_l^\theta)\sqrt{n}(\theta_n - \theta)$

  $\ \ = a_n(l) + A_n(l)\sqrt{n}(d_l^n - d_l^\theta) + R_n(l)$

with evident definitions of $a_n(l)$, $A_n(l)$ and $R_n(l)$. By Theorem 8.1 the
supremum when $l$ ranges over $\mathcal{C}$ of $a_n(l)$ tends to 0 in outer probability.
Then (i) is straightforward from Lemmas 8.4 and 8.6. By Slutsky's lemma
and M1, (i) gives (ii). Taking into account the continuity of the application
$z \mapsto \sup_{l \in \mathcal{C}} \|z(l)\|$, where $z$ is a function from $\mathcal{C}$ to $\mathbb{R}^k$, we easily deduce (iii)
from (ii). □

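5. Posterior regret. Let $l_0 \in \mathcal{C}$. From now on we think of $l_0$ as a
convenient approximation of the true loss. For simplicity of notation we write $d_0^n$
and $d_0^\theta$ instead of $d_{l_0}^n$ and $d_{l_0}^\theta$. We let $\mathcal{S}_0 \subset \mathcal{C}$ be a class which satisfies the
following conditions (recall that $V_\theta$ and $\rho_\eta$ were defined by 1b and 1e):

2a. For every $l \in \mathcal{S}_0$, $l(\cdot, d_0^\theta)$ is continuously differentiable on $V_\theta$.

2b. For every $\eta > 0$ and $\sigma \in \Theta$, we have

  $\sup_{l \in \mathcal{S}_0} \sup_{d \in B(d_0^\theta, \eta)} \|D_{01}l(\sigma, d) - D_{01}l(\sigma, d_0^\theta)\| \le \rho_\eta(\sigma)$.

2c. The families $\{D_{01}l(\cdot, d_0^\theta)|_{V_\theta}, l \in \mathcal{S}_0\}$ and $\{D_{10}l(\cdot, d_0^\theta)|_{V_\theta}, l \in \mathcal{S}_0\}$ are
equicontinuous at $\theta$.

2d. $\sup_{l \in \mathcal{S}_0} \|D_{01}l(\theta, d_0^\theta)\| < \infty$ and $\sup_{l \in \mathcal{S}_0} \|D_{10}l(\theta, d_0^\theta)\| < \infty$.

Similarly, the class $\mathcal{S} \subset \mathcal{C}$ is defined by replacing $d_0^\theta$ by $d_l^\theta$ and $\mathcal{S}_0$ by $\mathcal{S}$ in
conditions 2a-2d. In the remainder of this section we restrict our attention
to a class of loss functions $\mathcal{C}_1 \subset \mathcal{S} \cap \mathcal{S}_0$.

For every $l \in \mathcal{C}_1$ and every $d \in \mathcal{D}$, write

  $\mathrm{reg}_l^n(d) = l^n(d) - \inf_{d \in \mathcal{D}} l^n(d)$ and $\mathrm{reg}_l^\theta(d) = l(\theta, d) - \inf_{d \in \mathcal{D}} l(\theta, d)$.

This section is devoted to the study of the posterior regret process for the
decision $d_0^n$ associated with the convenient loss $l_0$. This measure of robustness
was used by Berger (1984).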

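Theorem 5.1. Under the assumptions M,

  $\sqrt{n}\,(\mathrm{reg}_l^n(d_0^n) - \mathrm{reg}_l^\theta(d_0^\theta))_{l \in \mathcal{C}_1}$
  $\ \ \rightsquigarrow \bigl([-D_{01}l(\theta, d_0^\theta)^t \varphi(l_0) + D_{10}l(\theta, d_0^\theta)^t - D_{10}l(\theta, d_l^\theta)^t]\, Z_\theta\bigr)_{l \in \mathcal{C}_1}$.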

Taking into account the continuity of the application $z \mapsto \sup_{l \in \mathcal{C}_1} |z(l)|$
defined on the functions from $\mathcal{C}_1$ to $\mathbb{R}^k$, we deduce from Theorem 5.1 the
asymptotic bound for every $u$,

  $\limsup_n Q^*\bigl(\sqrt{n} \sup_{l \in \mathcal{C}_1} |\mathrm{reg}_l^n(d_0^n) - \mathrm{reg}_l^\theta(d_0^\theta)| > u\bigr) \le Q\bigl(\sup_{l \in \mathcal{C}_1} |M_l| \ge u\bigr)$,

where $(M_l)_{l \in \mathcal{C}_1}$ is the limit process that appears in Theorem 5.1. The above
inequality provides information on the value of $n$ that we need to obtain
an arbitrarily robust analysis. For instance, choose $\alpha$ arbitrarily small and
$u \in \mathbb{R}$ so that the right-hand term is less than $\alpha$. Then with probability
greater than $1 - \alpha$, the posterior regret $\mathrm{reg}_l^n(d_0^n)$ associated with any loss
function $l \in \mathcal{C}_1$ is less than $u/\sqrt{n} + \sup_{l \in \mathcal{C}_1} \mathrm{reg}_l^\theta(d_0^\theta)$ for large $n$.
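
The bound $u/\sqrt{n}$ can be turned into a rough sample size rule. The sketch below assumes that a value of $u$ is already available (for instance a quantile of the supremum of the limit process, which the article does not compute here) and that a tolerance on the excess regret is given; the function name and the numbers are hypothetical.

```python
import math

def observations_needed(u, tolerance):
    """Smallest n such that u / sqrt(n) <= tolerance (asymptotic rule of thumb)."""
    if u <= 0.0 or tolerance <= 0.0:
        raise ValueError("u and tolerance must be positive")
    return math.ceil((u / tolerance) ** 2)

n_required = observations_needed(3.0, 0.25)
print(n_required, 3.0 / math.sqrt(n_required))
```

Halving the tolerance multiplies the required number of observations by four, which is the practical meaning of the rate $\sqrt{n}$.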

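Proof of Theorem 5.1. By Proposition 8.1 we have

  $\sqrt{n} \sup_{l \in \mathcal{C}_1} |l^n(d_l^n) - l(\theta, d_l^\theta) - D_{10}l(\theta, d_l^\theta)^t(\theta_n - \theta)| \xrightarrow{Q^*} 0$

and

  $\sqrt{n} \sup_{l \in \mathcal{C}_1} |l^n(d_0^n) - l(\theta, d_0^\theta) - D_{01}l(\theta, d_0^\theta)^t(d_0^n - d_0^\theta) - D_{10}l(\theta, d_0^\theta)^t(\theta_n - \theta)| \xrightarrow{Q^*} 0$.

The conclusion easily follows from Theorem 4.1 and Slutsky's lemma. □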

From a practical point of view, it is of interest to consider the particular
case where the optimal decision $d_l^\theta$ is actually independent of $l$, as is the case
in estimation problems. If we assume moreover that $l_0$ is such that $d_0^\theta = d_l^\theta$,
then by Theorem 5.1,

  $\sqrt{n} \sup_{l \in \mathcal{C}_1} \mathrm{reg}_l^n(d_0^n) \rightsquigarrow 0$.

In this situation, we can expect to obtain a better rate of convergence. As
a matter of fact, it turns out that the rate of convergence of the posterior
regret is of order $n$.

Theorem 5.2. Assume that $d_0^\theta = d_l^\theta$ for every $l \in \mathcal{C}_1$. Then under the
assumptions M,

  $n \sup_{l \in \mathcal{C}_1} \mathrm{reg}_l^n(d_0^n) \rightsquigarrow \frac{1}{2} \sup_{l \in \mathcal{C}_1} [Z_\theta^t(\varphi(l_0) - \varphi(l))^t D_{02}l(\theta, d_l^\theta)(\varphi(l_0) - \varphi(l)) Z_\theta]$.

The theorem gains in interest if we consider the special case where $\mathcal{D} = \Theta$,
and $l_0$ and every $l \in \mathcal{C}_1$ are functions of $d - \sigma$, which is a very common
situation in estimation problems. In this case, $\varphi(l) = -I_p$, where $I_p$ is the
$p \times p$ identity matrix and

  $n \sup_{l \in \mathcal{C}_1} \mathrm{reg}_l^n(d_0^n) \rightsquigarrow 0$.

It is easy to check that every assumption of this section is satisfied by the
class of Example 4.2. Thus, the result above also holds for this class.

Example 4.1 (continued). The assumptions of Section 5 are fulfilled
with $l_0(\sigma, d) = (d - \sigma)^2$. Define $p(w, n)$ such that $l^n(d) = p(w, n)\,\tilde{l}^n(d)$, where
$\tilde{l}^n(d) = \int_\Theta (d - \sigma)^2\,w_n(d\sigma)$, and
assume that $\sup_{w \in \Gamma} p(w, n)$ remains bounded in $Q^*$ probability [this holds,
e.g., if there exists $\underline{w}$ such that $w \ge \underline{w}$ for all $w \in \Gamma$ and if $\underline{w}$ and $w_0$ satisfy
the conditions of Strasser (1976) or Abraham and Cadre (2002)]. We
deduce from the above remark that $n \sup_{w \in \Gamma} (\int_\Theta (d_0^n - \sigma)^2\,w_n(d\sigma) - V(w_n)) \rightsquigarrow 0$,
where $d_0^n$ and $V(w_n)$ are, respectively, the posterior expectation derived from
the prior $w_0$ and the posterior variance derived from the prior $w$.

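Proof of Theorem 5.2. Since $D_{01}l^n(d_l^n) = 0$, we have, by Taylor's
formula,

  $\mathrm{reg}_l^n(d_0^n) = l^n(d_0^n) - l^n(d_l^n)$

  $\ \ = \int_0^1 (1 - s)(d_0^n - d_l^n)^t D_{02}l^n(d_l^n - s(d_l^n - d_0^n))(d_0^n - d_l^n)\,ds$.

However, by Theorem 4.1 and Lemma 8.4,

  $\sup_{l \in \mathcal{C}_1} \sup_{s \in [0,1]} \|D_{02}l^n(d_l^n - s(d_l^n - d_0^n)) - D_{02}l(\theta, d_l^\theta)\| \xrightarrow{Q^*} 0$.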

Moreover, we easily get by Theorem 4.1 that $\sqrt{n} \sup_{l \in \mathcal{C}_1} \|d_0^n - d_l^n\|$ remains
bounded in outer probability. Hence

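  $n \sup_{l \in \mathcal{C}_1} |\mathrm{reg}_l^n(d_0^n) - \frac{1}{2}(d_0^n - d_l^n)^t D_{02}l(\theta, d_l^\theta)(d_0^n - d_l^n)| \xrightarrow{Q^*} 0$.

We conclude by using again the asymptotic behavior of $\sqrt{n}\,(d_0^n - d_l^n)_{l \in \mathcal{C}_1}$. □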

6. Range of the posterior expected loss. The beginning of this section
is devoted to the study of the range of the posterior expected loss,

(6.1)  $\mathrm{ran}_{\mathcal{S}_0}^n(d) = \sup_{l \in \mathcal{S}_0} l^n(d) - \inf_{l \in \mathcal{S}_0} l^n(d)$,

where $d \in \mathcal{D}$ and $\mathcal{S}_0$ is defined in Section 5.

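Theorem 6.1. Assume that $d_0^\theta = d_l^\theta$ and $l(\theta, d_0^\theta) = l'(\theta, d_0^\theta)$ for every $l$
and $l' \in \mathcal{S}_0$. Then, under the assumptions M,

  $\sqrt{n}\,\mathrm{ran}_{\mathcal{S}_0}^n(d_0^n) \rightsquigarrow \sup_{l \in \mathcal{S}_0}[D_{10}l(\theta, d_0^\theta)^t Z_\theta] - \inf_{l \in \mathcal{S}_0}[D_{10}l(\theta, d_0^\theta)^t Z_\theta]$.

Proof. Since $D_{01}l(\theta, d_0^\theta) = 0$, Proposition 8.1 shows that

  $\sqrt{n} \sup_{l \in \mathcal{S}_0} |l^n(d_0^n) - l(\theta, d_0^\theta) - D_{10}l(\theta, d_0^\theta)^t(\theta_n - \theta)| \xrightarrow{Q^*} 0$.

This gives $(\sqrt{n}(l^n(d_0^n) - l(\theta, d_0^\theta)))_{l \in \mathcal{S}_0} \rightsquigarrow (D_{10}l(\theta, d_0^\theta)^t Z_\theta)_{l \in \mathcal{S}_0}$ according to
Theorem 4.1, but, by assumption,

  $\mathrm{ran}_{\mathcal{S}_0}^n(d_0^n) = \sup_{l \in \mathcal{S}_0}[l^n(d_0^n) - l(\theta, d_0^\theta)] - \inf_{l \in \mathcal{S}_0}[l^n(d_0^n) - l(\theta, d_0^\theta)]$,

so that the conclusion follows from a continuity argument as in the proof of
Theorem 4.1(iii). □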

It is worth pointing out that $\mathrm{ran}_{\mathcal{S}_0}^n(d) = S^n(d) - I^n(d)$ when there exist
$I$ and $S$ in $\mathcal{S}_0$ such that $\sup_{l \in \mathcal{S}_0} l = S$ and $\inf_{l \in \mathcal{S}_0} l = I$. Because of the above
remark, let us define another class of loss functions which is well adapted
to the study of the range of posterior expected loss. Let $I \in \mathcal{S}_0$ and $S \in \mathcal{S}_0$,
and define $[I, S]$ to be the class of loss functions $l : \Theta \times \mathcal{D} \to \mathbb{R}_+$ such that
$I \le l \le S$. Such a class was considered in Abraham (2001). The important
point to note here is that regularity assumptions are only required on $I$,
$S$ and $l_0$. Thus, this class includes very irregular losses as soon as they
are bounded by $I$ and $S$. This is very attractive from a practical point of
view since $l_0$ can be regarded as a tractable approximation of the true loss,
the accuracy of which is now given by $I$ and $S$. It is also of computational
interest because it involves only two loss functions. For simplicity of notation,
we write $\mathrm{ran}_{IS}^n(d)$ instead of $\mathrm{ran}_{[I,S]}^n(d)$, where the previous expression is
defined by replacing $\mathcal{S}_0$ by $[I, S]$ in (6.1). Similarly, we write

  $\mathrm{ran}_{IS}^\theta(d) = \sup_{l \in [I,S]} l(\theta, d) - \inf_{l \in [I,S]} l(\theta, d)$.

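Theorem 6.2. Under the assumptions M,

  $\sqrt{n}\,(\mathrm{ran}_{IS}^n(d_0^n) - \mathrm{ran}_{IS}^\theta(d_0^\theta))$
  $\ \ \rightsquigarrow [D_{10}(S - I)(\theta, d_0^\theta)^t - D_{01}(S - I)(\theta, d_0^\theta)^t \varphi(l_0)]\, Z_\theta$.

Proof. Since $S \in \mathcal{S}_0$, Proposition 8.1 yields

  $\sqrt{n}\,|S^n(d_0^n) - S(\theta, d_0^\theta) - D_{01}S(\theta, d_0^\theta)^t(d_0^n - d_0^\theta) - D_{10}S(\theta, d_0^\theta)^t(\theta_n - \theta)| \xrightarrow{Q^*} 0$.

The same result holds with $S$ replaced by $I$. Theorem 6.2 is then an
immediate consequence of Theorem 4.1 and assumption M2, since by assumption

  $\mathrm{ran}_{IS}^n(d_0^n) - \mathrm{ran}_{IS}^\theta(d_0^\theta) = [S^n(d_0^n) - S(\theta, d_0^\theta)] + [I(\theta, d_0^\theta) - I^n(d_0^n)]$. □

Observe that if $S$, $I$ and $l_0$ are functions of $d - \sigma$, Theorem 6.2 reduces
to

  $\sqrt{n}\,(\mathrm{ran}_{IS}^n(d_0^n) - \mathrm{ran}_{IS}^\theta(d_0^\theta)) \rightsquigarrow 0$.

In this case we can improve the rate of convergence.

Theorem 6.3. Assume that $I(\cdot, d_0^\theta)$ and $S(\cdot, d_0^\theta)$ are twice continuously
differentiable, $D_{10}I(\theta, d_0^\theta) = D_{10}S(\theta, d_0^\theta)$ and $D_{01}I(\theta, d_0^\theta) = D_{01}S(\theta, d_0^\theta)$. Then
under the assumptions M,

  $n\,(\mathrm{ran}_{IS}^n(d_0^n) - \mathrm{ran}_{IS}^\theta(d_0^\theta)) \rightsquigarrow \frac{1}{2}[Z_\theta^t(N_S - N_I)Z_\theta + L_S - L_I]$,

where

  $N_S = \varphi(l_0)^t D_{02}S(\theta, d_0^\theta)\varphi(l_0) + D_{20}S(\theta, d_0^\theta) - 2 D_{11}S(\theta, d_0^\theta)^t \varphi(l_0)$.

$N_I$ is defined by replacing $S$ by $I$ in the above formula, and the constants
$L_S$ and $L_I$ are defined in Section 8 by replacing $f$ by $S(\cdot, d_0^\theta)$ and $I(\cdot, d_0^\theta)$,
respectively, in (8.6). Furthermore, if $D_{01}S(\theta, d_0^\theta) = D_{10}S(\theta, d_0^\theta) = 0$, then

(6.2)  $n\,(S^n(d_0^n) - S(\theta, d_0^\theta)) \rightsquigarrow \frac{1}{2}[Z_\theta^t N_S Z_\theta + L_S]$.

The same result holds if $S$ is replaced by $I$ in (6.2) under the assumptions
$D_{01}I(\theta, d_0^\theta) = D_{10}I(\theta, d_0^\theta) = 0$.

Consider again the usual case where $l_0$, $S$ and $I$ may be expressed as
functions of $d - \sigma$. Then we have $\varphi(l_0) = -I_p$, $D_{02}S = D_{20}S = -D_{11}S$ and
finally $N_S = N_I = 0$, so that, by Theorem 6.3,

  $n\,(\mathrm{ran}_{IS}^n(d_0^n) - \mathrm{ran}_{IS}^\theta(d_0^\theta)) \rightsquigarrow \frac{1}{2}(L_S - L_I)$.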

Example 4.1 (continued). Take $w_I$ and $w_S$ in $\Gamma$ and consider the
density ratio class $\Gamma' = \{w \in L_1(m) : w_I \le w \le w_S\}$. If $p(w_S, n) \xrightarrow{Q^*} w_0(\theta)/w_S(\theta)$
[which holds under the conditions of Strasser (1976) or Abraham and Cadre
(2002)], it can be proved from (6.2) that

  $n \sup_{w \in \Gamma'} \int_\Theta (d_0^n - \sigma)^2\,w_n(d\sigma)\,\Bigl(\int \tau^2\,F_\theta(d\tau)\Bigr)^{-1}$

remains asymptotically in the interval $[1, w_S(\theta)/w_I(\theta)]$.



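Proof of Theorem 6.3. Write $\Delta = S - I$. Let us first examine the
convergence of the sequence $n(\Delta^n(d_0^n) - \Delta(\theta, d_0^\theta))$. By Taylor's formula,

  $\Delta^n(d_0^n) - \Delta^n(d_0^\theta)$

  $\ \ = D_{01}\Delta^n(d_0^\theta)^t(d_0^n - d_0^\theta)$

  $\ \ \ \ + \int_0^1 (1 - s)(d_0^n - d_0^\theta)^t D_{02}\Delta^n(d_0^\theta + s(d_0^n - d_0^\theta))(d_0^n - d_0^\theta)\,ds$

  $\ \ = A + B$,

where $A$ and $B$ are obviously defined. Theorems 4.1 and 8.1 show that

  $n\,|A - (\theta_n - \theta)^t D_{11}\Delta(\theta, d_0^\theta)^t(d_0^n - d_0^\theta)| \xrightarrow{Q^*} 0$.

Moreover, by Lemma 8.4 and Theorem 4.1, we have

  $n\,|B - \frac{1}{2}(d_0^n - d_0^\theta)^t D_{02}\Delta(\theta, d_0^\theta)(d_0^n - d_0^\theta)| \xrightarrow{Q^*} 0$.

Finally, since $D_{10}\Delta(\theta, d_0^\theta) = 0$, Theorem 8.2 shows that

  $n\,|\Delta^n(d_0^\theta) - \Delta(\theta, d_0^\theta) - \frac{1}{2}(\theta_n - \theta)^t D_{20}\Delta(\theta, d_0^\theta)(\theta_n - \theta) - \frac{1}{2}L_\Delta| \xrightarrow{Q^*} 0$.

Therefore, it follows from Theorem 4.1 that

  $n\,|\Delta^n(d_0^n) - \Delta(\theta, d_0^\theta) - \frac{1}{2}[(\theta_n - \theta)^t N_\Delta(\theta_n - \theta) + L_\Delta]| \xrightarrow{Q^*} 0$.

The second part of Theorem 6.3 is obtained by replacing $\Delta$ by $S$ and $I$,
respectively, in the above calculations. □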

7. Discussion. We give in this article sufficient conditions to get optimal 
rates of convergence. Let us investigate whether they are necessary. We 
mainly discuss the existence of the second d derivative. 

Consider the class $\mathcal{F}$ of Section 2.1 and define a new class $\tilde{\mathcal{F}}$ by replacing $U$ and $L$, respectively, by $\tilde U(\sigma, d) = f(d - \sigma)$ and $\tilde L(\sigma, d) = f(\sigma - d)$ in the construction of $\mathcal{F}$, where $f(t) = e^{-t} + t - 1$. Note that the quadratic loss $l_0$ defined in Section 2.1 belongs to $\tilde{\mathcal{F}}$. From the arguments of Section 2.1, the diameter of $\{d_n^l,\ l \in \tilde{\mathcal{F}}\}$ is equal to the diameter of $\{d_n^{\tilde U}, d_n^{\tilde L}\}$. Thus, from Section 4 (Example 4.2 applied to $\mathcal{L} = \{\tilde U, \tilde L\}$), $\sqrt{n}\,\mathrm{diameter}\{d_n^l,\ l \in \tilde{\mathcal{F}}\} \to 0$, while $\sqrt{n}\,\mathrm{diameter}\{d_n^l,\ l \in \mathcal{F}\}$ has a strictly positive limit. The difference in the limit indicates different rates of convergence, which are due to the fact that $D_{02}U(\theta, \theta)$ does not exist while $D_{02}\tilde U(\sigma, \theta) \approx D_{02}\tilde U(\theta, \theta)$ for $\sigma$ close to $\theta$. From a technical point of view the term $D_{02}l^n(t^l_{n,s})$, defined in the proof of Theorem 4.1, no longer converges to $D_{02}l(\theta, d_\theta^l)$ when $l = U$, but switches between $k_1$ and $k_2$ according to the sign of $d_n^l - \theta$ even for large $n$. Consequently, it is no longer possible to derive in this way the limit of $\sqrt{n}(d_n^U - \theta)$, and Theorem 4.1 does not hold for $\mathcal{L} = \{L, U\}$. [A theoretical asymptotic study of such classes can be found in Abraham (2002).] The lack of smoothness [i.e., $D_{02}U(\theta, \theta)$ does not exist] slows down the rate of convergence. Analogous situations have already been noted in prior robustness: classes with point mass priors have slower rates of convergence [Sivaganesan (1988)].
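
The contrast between a smooth and a nonsmooth pair of extreme losses can be illustrated numerically in a deliberately simple setting. The sketch below is not taken from Section 2.1. It assumes a one-dimensional normal posterior N(m, 1/n); it uses the asymmetric quadratic loss U(sigma, d) = k1 (d - sigma)^2 for d >= sigma and k2 (d - sigma)^2 otherwise (with L obtained by exchanging k1 and k2) as a stand-in for the nonsmooth pair; and it uses f(t) = exp(-t) + t - 1 for the smooth pair. The function names and the values k1 = 1, k2 = 4 are illustrative assumptions.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expectile_z(k_over, k_under):
    """Root z of k_over * E[(z - Z)_+] = k_under * E[(Z - z)_+], Z ~ N(0, 1).

    This is the first-order condition for the assumed asymmetric quadratic
    loss, written in standardized units; it is solved by bisection.
    """
    def g(z):
        over = z * Phi(z) + phi(z)            # E[(z - Z)_+]
        under = phi(z) - z * (1.0 - Phi(z))   # E[(Z - z)_+]
        return k_over * over - k_under * under  # increasing in z

    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def sqrt_n_diam_rough(n, k1, k2):
    """sqrt(n) * |d_n^U - d_n^L| for the assumed nonsmooth pair."""
    s = 1.0 / math.sqrt(n)                   # posterior standard deviation
    d_upper = s * expectile_z(k1, k2)        # Bayes action minus m, loss U
    d_lower = s * expectile_z(k2, k1)        # Bayes action minus m, loss L
    return math.sqrt(n) * abs(d_upper - d_lower)


def sqrt_n_diam_smooth(n):
    """sqrt(n) * |d_n^U~ - d_n^L~| for f(t) = exp(-t) + t - 1.

    With sigma ~ N(m, v): E f(d - sigma) = exp(-d + m + v/2) + d - m - 1,
    minimized at d = m + v/2; by symmetry E f(sigma - d) is minimized at
    d = m - v/2. The diameter is therefore v = 1/n.
    """
    v = 1.0 / n
    return math.sqrt(n) * v


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, sqrt_n_diam_rough(n, 1.0, 4.0), sqrt_n_diam_smooth(n))
```

Under these assumptions the first quantity does not depend on n, whereas the second decays like n^{-1/2}, which is the qualitative behavior described in the discussion.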

8. Auxiliary assumptions and results. 

8.1. The assumptions M. 

M1. There exist $\theta \in \Theta$ and a matrix $I_\theta$ such that $\sqrt{n}(\theta_n - \theta)$ converges in distribution to a centered normal random variable $Z_\theta$ with covariance matrix $I_\theta$.

M2. For every $g \in L_1(\pi)$ and $a > 0$, there exists $\eta > 0$ such that

\[
e^{\eta n} \int_{\|\sigma - \theta\| > a} g(\sigma)\,\pi_n(d\sigma) \to 0 \quad \text{in } Q \text{ probability.}
\]

Write for all $k > 0$,

\[
W_n^k = \{\sigma \in \Theta : \|T(\sigma)\| \le \sqrt{k \log n}\},
\]

where $T(\sigma) = \sqrt{n}\, I_\theta^{-1/2}(\sigma - \theta_n)$. Let $F_n$ be the probability distribution induced by $T$ applied to $\pi_n$ and let $B_n^k$ be the closed ball with center 0 and radius $\sqrt{k \log n}$.

M3. For all $r > 0$, there exist $k > 0$ and $c > 0$ such that

\[
Q\bigl(\pi_n(\Theta \setminus W_n^k) > c\,n^{-r}\bigr) \to 0.
\]

M4. There exists a probability distribution $F_\theta$ with zero mean such that

\[
\int g(\sigma)\,F_n(d\sigma) \to \int g(\sigma)\,F_\theta(d\sigma)
\]

in $Q$ probability, for all $g : \Theta \to \mathbb{R}$ with $|g(\sigma)| \le c(1 + \|\sigma\|^2)$ for some $c > 0$ and all $\sigma \in \Theta$.

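8.2. Asymptotics for the posterior expectation. Throughout this section, we denote by $Gf(\sigma)$ the gradient at $\sigma \in \Theta$ of a function $f : \Theta \to \mathbb{R}$.

8.2.1. First order result. We denote by $\mathcal{P}_\theta$ a set of functions $f : \Theta \to \mathbb{R}$ with the following properties:

A1. For all $f \in \mathcal{P}_\theta$, $f(\theta) = 0$.

A2. There exists an open neighborhood $V_\theta'$ of $\theta$ on which any $f \in \mathcal{P}_\theta$ is continuously differentiable and $\sup_{f \in \mathcal{P}_\theta} \|Gf(\theta)\| < \infty$.

A3. The family $\{Gf|_{V_\theta'},\ f \in \mathcal{P}_\theta\}$ is equicontinuous at $\theta$.

A4. There exist a $\pi$-integrable function $g : \Theta \to \mathbb{R}$ and $\delta_0 > 0$ such that

\[
\sup_{f \in \mathcal{P}_\theta} |f(\sigma)| \le g(\sigma) \quad \forall \sigma \in \Theta
\qquad \text{and} \qquad
\sup_{\|\sigma - \theta\| \le \delta_0} g(\sigma) < \infty.
\]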

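Theorem 8.1. Under the assumptions M,

\[
\sqrt{n}\, \sup_{f \in \mathcal{P}_\theta} \Bigl| \int_\Theta f(\sigma)\,\pi_n(d\sigma) - Gf(\theta)^t(\theta_n - \theta) \Bigr| \xrightarrow{Q^*} 0.
\]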



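Proof. We proceed analogously to the proof of Theorem 1 of Strasser (1975). We separate the proof into two steps.

Step 1. Let us prove that for every $c > 0$, there exists $k > 0$ such that

\[
Q^*\Bigl(\sqrt{n}\, \sup_{f \in \mathcal{P}_\theta} \int_{\Theta \setminus W_n^k} |f(\sigma)|\,\pi_n(d\sigma) > c\Bigr) \to 0.
\]

Let $\iota = \inf_{\|\sigma\| = 1} \|I_\theta^{-1/2}\sigma\|$ and $\delta = \iota\,\delta_0$, where $\delta_0$ is the real number of A4. Clearly, we have $\iota > 0$ and hence $\delta > 0$. Moreover, we also have, by A4,

\[
a := \sup_{\|I_\theta^{-1/2}(\sigma - \theta)\| \le \delta} g(\sigma) \le \sup_{\|\sigma - \theta\| \le \delta_0} g(\sigma) < \infty.
\]

Fix $c > 0$. By A4, we have, for all $k > 0$,

\[
\sqrt{n}\, \sup_{f \in \mathcal{P}_\theta} \int_{\Theta \setminus W_n^k} |f(\sigma)|\,\pi_n(d\sigma) > c
\ \Longrightarrow\
\sqrt{n} \int_{\Theta \setminus W_n^k} g(\sigma)\,\pi_n(d\sigma) > c,
\]

and if the latter property holds, then

\[
\|I_\theta^{-1/2}(\theta_n - \theta)\| > \delta/2
\quad \text{or} \quad
\Bigl(\sqrt{n} \int_{\Theta \setminus W_n^k} g(\sigma)\,\pi_n(d\sigma) > c,\ \ \|I_\theta^{-1/2}(\theta_n - \theta)\| \le \frac{\delta}{2}\Bigr).
\tag{8.1}
\]

The probability of the event associated with the first property tends to 0 by M1. We now focus on the second property. Let us denote by $\mathcal{E}$ the subset of $\Theta$ defined as

\[
\mathcal{E} = \{\sigma \in \Theta : \|I_\theta^{-1/2}(\sigma - \theta)\| \le \delta\}.
\]

There exists $N \ge 1$ such that if $\|I_\theta^{-1/2}(\theta_n - \theta)\| \le \delta/2$, then for all $n \ge N$, $W_n^k \subset \mathcal{E}$. Thus, if the second property in (8.1) holds,

\[
\Bigl(\sqrt{n} \int_{\Theta \setminus \mathcal{E}} g(\sigma)\,\pi_n(d\sigma) > \frac{c}{2}\Bigr)
\quad \text{or} \quad
\Bigl(\sqrt{n} \int_{\mathcal{E} \setminus W_n^k} g(\sigma)\,\pi_n(d\sigma) > \frac{c}{2},\ \ W_n^k \subset \mathcal{E}\Bigr).
\]

Using the obvious notation, let $A$ and $B$ be the events associated with the above properties. On one hand, the probability of $A$ tends to 0 by M2. On the other hand,

\[
B \subset \{a \sqrt{n}\, \pi_n(\Theta \setminus W_n^k) > c/2\}
\]

and, for some $k > 0$, the probability of the latter event tends to 0 by M3.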

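Step 2. Let us prove that for all $k, c > 0$,

\[
Q^*\Bigl(\sqrt{n}\, \sup_{f \in \mathcal{P}_\theta} \Bigl| \int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) - Gf(\theta)^t(\theta_n - \theta) \Bigr| > c\Bigr) \to 0.
\]

We obviously have, for all $f \in \mathcal{P}_\theta$,

\[
\int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) = \int_{B_n^k} f(T^{-1}(\tau))\,F_n(d\tau),
\tag{8.2}
\]

where $T$ and $B_n^k$ are defined in Section 3 [recall that $T^{-1}(\tau) = \theta_n + n^{-1/2} I_\theta^{1/2}\tau$]. If $T^{-1}(\tau) \in V_\theta'$, then there exists $\lambda \in\, ]0, 1[$ such that, according to A1,

\[
f(T^{-1}(\tau)) = Gf(\theta + \lambda u(\tau))^t u(\tau),
\tag{8.3}
\]

where $u(\tau) = \theta_n - \theta + n^{-1/2} I_\theta^{1/2}\tau$. Let us denote by $H$ the property

\[
\forall \tau \in B_n^k, \quad T^{-1}(\tau) \in V_\theta' \ \text{ and } \ \theta + \lambda u(\tau) \in V_\theta'.
\]

It is easy to check that there exist $s > 0$ and $N \ge 1$ such that, for all $n \ge N$, $\|\theta_n - \theta\| \le s \Longrightarrow H$. Then, if the property

\[
\sqrt{n}\, \sup_{f \in \mathcal{P}_\theta} \Bigl| \int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) - Gf(\theta)^t(\theta_n - \theta) \Bigr| > c
\]

holds, we have $\|\theta_n - \theta\| > s$ or

\[
\Bigl(\sqrt{n}\, \sup_{f \in \mathcal{P}_\theta} \Bigl| \int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) - Gf(\theta)^t(\theta_n - \theta) \Bigr| > c,\ \ H\Bigr).
\tag{8.4}
\]

By M1 we need only to focus on the latter property. If $H$ holds, we have, according to (8.2) and (8.3),

\[
\begin{aligned}
\sup_{f \in \mathcal{P}_\theta} \Bigl| \int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) &- Gf(\theta)^t(\theta_n - \theta) \Bigr| \\
&= \sup_{f \in \mathcal{P}_\theta} \Bigl| \int_{B_n^k} Gf(\theta + \lambda u(\tau))^t u(\tau)\,F_n(d\tau) - Gf(\theta)^t(\theta_n - \theta) \Bigr| \\
&\le \sup_{f \in \mathcal{P}_\theta} \Bigl| \int_{B_n^k} \bigl(Gf(\theta + \lambda u(\tau)) - Gf(\theta)\bigr)^t u(\tau)\,F_n(d\tau) \Bigr| \\
&\quad + \sup_{f \in \mathcal{P}_\theta} \bigl| Gf(\theta)^t(\theta_n - \theta)\,(F_n(B_n^k) - 1) \bigr| \\
&\quad + \sup_{f \in \mathcal{P}_\theta} n^{-1/2} \Bigl| Gf(\theta)^t I_\theta^{1/2} \int_{B_n^k} \tau\,F_n(d\tau) \Bigr|.
\end{aligned}
\tag{8.5}
\]

Let $\gamma > 0$. By A3 there exists $\beta > 0$ such that, for all $\sigma \in V_\theta'$ with $\|\sigma - \theta\| \le \beta$,

\[
\sup_{f \in \mathcal{P}_\theta} \|Gf(\sigma) - Gf(\theta)\| \le \gamma.
\]

Let $a = \sup_{f \in \mathcal{P}_\theta} \|Gf(\theta)\|$, which is finite by A2. For all $n \ge N$, if the property in (8.4) holds, we have, by (8.5),

\[
\begin{aligned}
&\bigl(\|\theta_n - \theta\| + n^{-1/2}\|I_\theta^{1/2}\| \sqrt{k \log n} > \beta\bigr), \\
&\Bigl(\gamma \sqrt{n}\,\|\theta_n - \theta\| + \gamma \|I_\theta^{1/2}\| \int_{B_n^k} \|\tau\|\,F_n(d\tau) > \frac{c}{3}\Bigr), \\
&a \sqrt{n}\,\|\theta_n - \theta\|\,|F_n(B_n^k) - 1| > \frac{c}{3}
\quad \text{or} \quad
a\,\Bigl\| I_\theta^{1/2} \int_{B_n^k} \tau\,F_n(d\tau) \Bigr\| > \frac{c}{3}.
\end{aligned}
\]

Since $F_\theta$ is centered, $\int_{B_n^k} \tau\,F_n(d\tau) \to 0$ in probability by M4. Hence the probability of the event associated with the latter property vanishes. The probabilities of the events related with the other properties tend to 0 by M2 and M4, for some choice of $\gamma$. Step 2 is then proved and the theorem is a straightforward consequence of Steps 1 and 2. □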
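
As a sanity check on the first order statement of Theorem 8.1, the following sketch evaluates the normalized error in closed form in a toy case. The setting is an assumption made for illustration only: dimension one, posterior pi_n = N(theta_n, 1/n) (as in a unit-variance normal model with a flat prior), theta_n - theta = z / sqrt(n) for a fixed z, and the single test function f(sigma) = sin(sigma - theta), for which f(theta) = 0 and Gf(theta) = 1. The helper name `first_order_gap` is hypothetical.

```python
import math


def first_order_gap(n, z):
    """sqrt(n) * | E_n f - Gf(theta) * (theta_n - theta) | in the toy case.

    Assumptions (illustrative, not from the paper):
      * the posterior is N(theta_n, 1/n),
      * theta_n - theta = z / sqrt(n),
      * f(sigma) = sin(sigma - theta), so Gf(theta) = 1.
    For X ~ N(delta, v), E sin(X) = sin(delta) * exp(-v / 2).
    """
    delta = z / math.sqrt(n)                            # theta_n - theta
    posterior_mean_f = math.sin(delta) * math.exp(-0.5 / n)
    return math.sqrt(n) * abs(posterior_mean_f - 1.0 * delta)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, first_order_gap(n, z=1.0))
```

In this toy case the printed values shrink roughly like 1/n, so the error is negligible at the sqrt(n) scale, as the theorem asserts for classes satisfying A1 to A4.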

8.2.2. Second order result. Throughout this section we denote by $Hf(\sigma)$ the Hessian matrix at $\sigma \in \Theta$ of a function $f : \Theta \to \mathbb{R}$ that satisfies the following properties:

B1. There exists an open neighborhood $V_\theta''$ of $\theta$ on which $f$ is twice continuously differentiable.

B2. $f(\theta) = 0$ and $Gf(\theta) = 0$.

B3. $f$ is $\pi$-integrable.

We introduce the notation

\[
L_f = \int (I_\theta^{1/2}\tau)^t\, Hf(\theta)\,(I_\theta^{1/2}\tau)\,F_\theta(d\tau),
\tag{8.6}
\]

provided such a quantity may be defined. Note that $F_\theta$ is normal under usual models [Strasser (1976)].



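Theorem 8.2. Under the assumptions M,

\[
n\, \Bigl| \int_\Theta f(\sigma)\,\pi_n(d\sigma) - \frac{1}{2}(\theta_n - \theta)^t Hf(\theta)(\theta_n - \theta) - \frac{1}{2n} L_f \Bigr| \to 0
\]

in probability.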

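Proof. Following the arguments of the first step of the proof of Theorem 8.1, we can prove that for all $c > 0$ there exists $k > 0$ such that

\[
Q\Bigl(n \int_{\Theta \setminus W_n^k} |f(\sigma)|\,\pi_n(d\sigma) > c\Bigr) \to 0.
\]

Hence, we need only to prove that for all $k > 0$,

\[
n\, \Bigl| \int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) - \frac{1}{2}(\theta_n - \theta)^t Hf(\theta)(\theta_n - \theta) - \frac{1}{2n} L_f \Bigr| \to 0
\]

in probability. We use the notation of the proof of Theorem 8.1. According to B2, if $T^{-1}(\tau) \in V_\theta''$, then there exists $\lambda \in\, ]0, 1[$ such that

\[
f(T^{-1}(\tau)) = \tfrac{1}{2}\, u(\tau)^t\, Hf(\theta + \lambda u(\tau))\, u(\tau).
\tag{8.7}
\]

Fix $k > 0$ and denote by $H'$ the property

\[
\forall \tau \in B_n^k, \quad T^{-1}(\tau) \in V_\theta'' \ \text{ and } \ \theta + \lambda u(\tau) \in V_\theta''.
\]

For some $s > 0$ and $N \ge 1$, we have $\|\theta_n - \theta\| \le s \Longrightarrow H'$ for all $n \ge N$. If $H'$ holds, then according to (8.2) and (8.7),

\[
\begin{aligned}
\Bigl| \int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) &- \frac{1}{2}(\theta_n - \theta)^t Hf(\theta)(\theta_n - \theta) - \frac{1}{2n} L_f \Bigr| \\
&\le \frac{1}{2} \Bigl| \int_{B_n^k} u(\tau)^t \bigl(Hf(\theta + \lambda u(\tau)) - Hf(\theta)\bigr) u(\tau)\,F_n(d\tau) \Bigr| \\
&\quad + \frac{1}{2}\, \|\theta_n - \theta\|^2\, \|Hf(\theta)\|\, |F_n(B_n^k) - 1| \\
&\quad + n^{-1/2}\, \|Hf(\theta)\|\, \|\theta_n - \theta\|\, \Bigl\| I_\theta^{1/2} \int_{B_n^k} \tau\,F_n(d\tau) \Bigr\| \\
&\quad + \frac{1}{2n} \Bigl| \int_{B_n^k} (I_\theta^{1/2}\tau)^t\, Hf(\theta)\,(I_\theta^{1/2}\tau)\,F_n(d\tau) - L_f \Bigr|.
\end{aligned}
\tag{8.8}
\]

Let $\gamma > 0$. According to B1, there exists $\beta > 0$ such that if $\sigma \in V_\theta''$ with $\|\sigma - \theta\| \le \beta$,

\[
\|Hf(\sigma) - Hf(\theta)\| \le \gamma.
\]

Fix $c > 0$ and let

\[
L_f^n = \int_{B_n^k} (I_\theta^{1/2}\tau)^t\, Hf(\theta)\,(I_\theta^{1/2}\tau)\,F_n(d\tau).
\]

We deduce from (8.8) that if we have

\[
n\, \Bigl| \int_{W_n^k} f(\sigma)\,\pi_n(d\sigma) - \frac{1}{2}(\theta_n - \theta)^t Hf(\theta)(\theta_n - \theta) - \frac{1}{2n} L_f \Bigr| > c,
\]

then for all $n \ge N$,

\[
\begin{aligned}
&(\|\theta_n - \theta\| > s) \quad \text{or} \quad \bigl(\|\theta_n - \theta\| + n^{-1/2}\|I_\theta^{1/2}\| \sqrt{k \log n} > \beta\bigr) \quad \text{or} \\
&\Bigl(\frac{\gamma n}{2} \int_{B_n^k} \|u(\tau)\|^2\,F_n(d\tau) > \frac{c}{4}\Bigr) \quad \text{or} \\
&\Bigl(\frac{n}{2}\, \|\theta_n - \theta\|^2\, \|Hf(\theta)\|\, |F_n(B_n^k) - 1| > \frac{c}{4}\Bigr) \quad \text{or} \\
&\Bigl(\sqrt{n}\, \|Hf(\theta)\|\, \|\theta_n - \theta\|\, \Bigl\| I_\theta^{1/2} \int_{B_n^k} \tau\,F_n(d\tau) \Bigr\| > \frac{c}{4}\Bigr) \quad \text{or} \\
&\Bigl(\frac{1}{2}\, |L_f^n - L_f| > \frac{c}{4}\Bigr).
\end{aligned}
\tag{8.9}
\]

According to M2 and M4, the sequence $\bigl(n \int_{B_n^k} \|u(\tau)\|^2\,F_n(d\tau)\bigr)_n$ is stochastically bounded and hence, for some $\gamma$, we have

\[
Q\Bigl(\frac{\gamma n}{2} \int_{B_n^k} \|u(\tau)\|^2\,F_n(d\tau) > \frac{c}{4}\Bigr) \to 0.
\]

Moreover, the probability of the events associated with the other properties of (8.9) obviously vanishes according to M2 and M4. □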
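
The second order expansion can be checked in the same kind of toy case. The sketch below is an illustration under stated assumptions, not part of the proof: dimension one, posterior pi_n = N(theta_n, 1/n), so that I_theta = 1 and F_theta = N(0, 1); theta_n - theta = z / sqrt(n) for a fixed z; and f(sigma) = 1 - cos(sigma - theta), which satisfies f(theta) = 0, Gf(theta) = 0 and Hf(theta) = 1, so that the quantity L_f of (8.6) equals 1. The helper name `second_order_gap` is hypothetical, and the L_f term enters the approximation with the factor 1/(2n).

```python
import math


def second_order_gap(n, z):
    """n * | E_n f - (1/2) Hf(theta) (theta_n - theta)^2 - L_f / (2 n) |.

    Assumptions (illustrative, not from the paper):
      * the posterior is N(theta_n, 1/n), hence I_theta = 1, F_theta = N(0, 1),
      * theta_n - theta = z / sqrt(n),
      * f(sigma) = 1 - cos(sigma - theta), so Hf(theta) = 1 and L_f = 1.
    For X ~ N(delta, v), E cos(X) = cos(delta) * exp(-v / 2).
    """
    delta = z / math.sqrt(n)                            # theta_n - theta
    posterior_mean_f = 1.0 - math.cos(delta) * math.exp(-0.5 / n)
    hessian = 1.0                                       # Hf(theta)
    l_f = 1.0                                           # integral of tau^2 under N(0, 1)
    approx = 0.5 * hessian * delta ** 2 + l_f / (2.0 * n)
    return n * abs(posterior_mean_f - approx)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, second_order_gap(n, z=1.0))
```

Here both the quadratic term in theta_n - theta and the term L_f / (2n) are of order 1/n, and the printed remainder, already multiplied by n, still tends to 0.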



8.3. Technical results for the classes $\mathcal{L}$, $\mathcal{S}$ and $\mathcal{S}_0$.



Lemma 8.1. Let $g$ be a $\pi$-integrable and nonnegative real-valued function such that there exists a bounded neighborhood of $\theta$ on which $g$ is bounded. Then under the assumptions M,

\[
Q\Bigl(\int_\Theta g(\sigma)\,\pi_n(d\sigma) < \infty\Bigr) \to 1.
\]

Proof. Denote by $B$ the bounded neighborhood of $\theta$. For $t \ge 1$ let $f_n(t) = Q\bigl(\int_{B^c} g(\sigma)\,\pi_n(d\sigma) > t\bigr)$. By M2 we have

\[
\sup_{t \ge 1} |f_n(t)| \le Q\Bigl(\int_{B^c} g(\sigma)\,\pi_n(d\sigma) > 1\Bigr) \to 0.
\]

Furthermore, $\lim_{t \to \infty} f_n(t)$ exists since $f_n$ is decreasing and bounded, so that $\lim_{n \to \infty} \lim_{t \to \infty} f_n(t) = \lim_{t \to \infty} \lim_{n \to \infty} f_n(t) = 0$. We conclude by noting that

\[
Q\Bigl(\int_\Theta g(\sigma)\,\pi_n(d\sigma) = \infty\Bigr) \le \lim_{t \nearrow \infty} Q\Bigl(\int_B g(\sigma)\,\pi_n(d\sigma) > t\Bigr) + \lim_{t \nearrow \infty} f_n(t),
\]

hence the lemma, since $g$ is bounded on $B$. □

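Lemma 8.2. Under the assumptions M, there exists a compact set $K \subset \mathcal{D}$ such that $Q^*(\exists\, l \in \mathcal{L},\ d_n^l \in K^c) \to 0$.

Proof. Take $r > 0$ and $K$ compact as in If and introduce $a$ and $0 < \varepsilon < 1$ such that

\[
\sup_{l \in \mathcal{L}} \inf_{d \in K} l(\theta, d) < (1 - \varepsilon)a < a < \inf_{l \in \mathcal{L}}\ \inf_{\sigma \in B(\theta, r)}\ \inf_{d \in K^c} l(\sigma, d).
\]

Then, if $d \in K^c$, we have

\[
l^n(d) = \int_{B(\theta, r)} l(\sigma, d)\,\pi_n(d\sigma) + \int_{B^c(\theta, r)} l(\sigma, d)\,\pi_n(d\sigma) \ge a\,\pi_n(B(\theta, r)).
\]

Thus

\[
\begin{aligned}
\exists\, l \in \mathcal{L},\ d_n^l \in K^c
&\Longrightarrow \exists\, l \in \mathcal{L},\ \exists\, d \in K^c,\ \ l^n(d) \le \inf_{t \in K} l^n(t) \\
&\Longrightarrow \exists\, l \in \mathcal{L},\ \ a\,\pi_n(B(\theta, r)) \le \inf_{t \in K} l^n(t) \\
&\Longrightarrow \Bigl(\exists\, l \in \mathcal{L},\ \ a\,\pi_n(B(\theta, r)) \le \inf_{t \in K} l(\theta, t) + \varepsilon \frac{a}{2}\Bigr)
\ \text{or}\ 
\Bigl(\exists\, l \in \mathcal{L},\ \ \inf_{t \in K} l^n(t) > \inf_{t \in K} l(\theta, t) + \varepsilon \frac{a}{2}\Bigr) \\
&\Longrightarrow \Bigl(a\Bigl(-\frac{\varepsilon}{2} + \pi_n(B(\theta, r))\Bigr) \le \sup_{l \in \mathcal{L}} \inf_{t \in K} l(\theta, t)\Bigr)
\ \text{or}\ 
\Bigl(\sup_{l \in \mathcal{L}} \sup_{t \in K} |l^n(t) - l(\theta, t)| > \varepsilon \frac{a}{2}\Bigr) \\
&\Longrightarrow \Bigl(\pi_n(B^c(\theta, r)) > \frac{\varepsilon}{2}\Bigr)
\ \text{or}\ 
\Bigl(\int_\Theta \sup_{l \in \mathcal{L}} \sup_{t \in K} |l(\sigma, t) - l(\theta, t)|\,\pi_n(d\sigma) > \varepsilon \frac{a}{2}\Bigr).
\end{aligned}
\]

By M2, $Q(\pi_n(B^c(\theta, r)) > \varepsilon/2) \to 0$. Moreover, if the last condition on the right-hand side holds, then for all $\rho > 0$,

\[
\int_{B(\theta, \rho)} \sup_{l \in \mathcal{L}} \sup_{t \in K} |l(\sigma, t) - l(\theta, t)|\,\pi_n(d\sigma) > \varepsilon \frac{a}{4}
\quad \text{or} \quad
\int_{B^c(\theta, \rho)} \sup_{l \in \mathcal{L}} \sup_{t \in K} |l(\sigma, t) - l(\theta, t)|\,\pi_n(d\sigma) > \varepsilon \frac{a}{4}.
\]

By Id we choose $\rho$ small enough so that the outer probability of the event associated with the first property tends to 0. Then, for the second property, bound the integrand by $g_1 \in L_1(\pi)$ and conclude by the concentration assumption M2. Since $\mathcal{L}$ is $\pi$-dominated, the existence of $g_1$ is deduced from the compactness of $K$. □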

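Lemma 8.3. Under the assumptions M,

\[
\sup_{l \in \mathcal{L}} \|d_n^l - d_\theta^l\| \xrightarrow{Q^*} 0.
\]

Proof. According to Lemma 8.2, we may restrict our attention to those $x \in \{x \in \mathcal{X},\ \forall\, l \in \mathcal{L},\ d_n^l(x) \in K\}$, where $K$ is a compact set. By If there is no loss of generality in assuming that $d_\theta^l \in K$ for $l \in \mathcal{L}$. Let $\varepsilon > 0$. Note that, for $l \in \mathcal{L}$ and $d \in B^c(d_\theta^l, \varepsilon)$, the property $l^n(d) \le l^n(d_\theta^l)$ implies that

\[
\begin{aligned}
l^n(d) - l(\theta, d) &\le -\bigl(l(\theta, d) - l(\theta, d_\theta^l)\bigr) + \bigl(l^n(d_\theta^l) - l(\theta, d_\theta^l)\bigr) \\
&\le -\kappa(\varepsilon) + \bigl(l^n(d_\theta^l) - l(\theta, d_\theta^l)\bigr),
\end{aligned}
\]

where the last inequality follows from lg. According to the above remark, we have, for all $r > 0$,

\[
\begin{aligned}
\sup_{l \in \mathcal{L}} \|d_n^l - d_\theta^l\| > \varepsilon
&\Longrightarrow \exists\, l \in \mathcal{L},\ \exists\, d \in B^c(d_\theta^l, \varepsilon) \cap K,\ \ l^n(d) \le l^n(d_\theta^l) \\
&\Longrightarrow \Bigl(\sup_{l \in \mathcal{L}} \sup_{d \in K} |l^n(d) - l(\theta, d)| \ge \frac{\kappa(\varepsilon)}{2}\Bigr)
\ \text{or}\ 
\Bigl(\sup_{l \in \mathcal{L}} |l^n(d_\theta^l) - l(\theta, d_\theta^l)| \ge \frac{\kappa(\varepsilon)}{2}\Bigr) \\
&\Longrightarrow \sup_{l \in \mathcal{L}} \sup_{d \in K} |l^n(d) - l(\theta, d)| \ge \frac{\kappa(\varepsilon)}{2} \\
&\Longrightarrow \sup_{l \in \mathcal{L}} \sup_{d \in K} \sup_{\sigma \in B(\theta, r)} |l(\sigma, d) - l(\theta, d)|
+ \int_{B^c(\theta, r)} \sup_{l \in \mathcal{L}} \sup_{d \in K} |l(\sigma, d) - l(\theta, d)|\,\pi_n(d\sigma) \ge \frac{\kappa(\varepsilon)}{2}.
\end{aligned}
\]

By Id, we can choose $r > 0$ such that

\[
\sup_{l \in \mathcal{L}} \sup_{d \in K} \sup_{\sigma \in B(\theta, r)} |l(\sigma, d) - l(\theta, d)| \le \frac{\kappa(\varepsilon)}{4}
\]

and we thus get

\[
\sup_{l \in \mathcal{L}} \|d_n^l - d_\theta^l\| > \varepsilon
\ \Longrightarrow\
\int_{B^c(\theta, r)} \sup_{l \in \mathcal{L}} \sup_{d \in K} |l(\sigma, d) - l(\theta, d)|\,\pi_n(d\sigma) \ge \frac{\kappa(\varepsilon)}{4}.
\]

Taking into account the compactness of $K$, we can deduce from the definition of a locally $\pi$-dominated class that there exists $g_1 \in L_1(\pi)$ such that

\[
\sup_{l \in \mathcal{L}} \sup_{d \in K} |l(\sigma, d) - l(\theta, d)| \le g_1(\sigma) \quad \forall \sigma \in \Theta.
\]

The conclusion then follows from M2 and lg. □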

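Lemma 8.4. For every $n \ge 1$, $s \in [0, 1]$ and $l \in \mathcal{L}$, let $t^l_{n,s} : \mathcal{X} \to \mathcal{D}$ be a map such that $\sup_{l \in \mathcal{L}} \sup_{s \in [0,1]} \|t^l_{n,s} - d_\theta^l\| \xrightarrow{Q^*} 0$. Then, under the assumptions M,

\[
\sup_{l \in \mathcal{L}} \sup_{s \in [0,1]} \|D_{02}l^n(t^l_{n,s}) - D_{02}l(\theta, d_\theta^l)\| \xrightarrow{Q^*} 0.
\]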
Proof. Fix e > 0. By le take rj > such that sup CTg y p v (o~) < e/2. Then 

sup sup \\D 02 l n (tf tS )-D 02 l n (df)\\>B 
leC sg[o,i] 



sup sup ||ij? - df || > rj) 
lec se[o,i] 



or 



:[0v 

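The outer probability of the events associated with the above properties tends to 0 by assumption and M2. Consequently, it remains to prove that

\[
\sup_{l \in \mathcal{L}} \sup_{s \in [0,1]} \|D_{02}l^n(d_\theta^l) - D_{02}l(\theta, d_\theta^l)\| \xrightarrow{Q^*} 0.
\]

By Id take $\beta > 0$ such that

\[
\sup_{l \in \mathcal{L}} \sup_{\sigma \in B(\theta, \beta)} \|D_{02}l(\sigma, d_\theta^l) - D_{02}l(\theta, d_\theta^l)\| < \frac{\varepsilon}{2}.
\]

Then by splitting the integral according to $\Theta = B(\theta, \beta) \cup B(\theta, \beta)^c$, we have

\[
\sup_{l \in \mathcal{L}} \Bigl\| \int_\Theta \bigl(D_{02}l(\sigma, d_\theta^l) - D_{02}l(\theta, d_\theta^l)\bigr)\,\pi_n(d\sigma) \Bigr\| > \varepsilon
\ \Longrightarrow\
\int_{B(\theta, \beta)^c} \sup_{l \in \mathcal{L}} \bigl(\|D_{02}l(\sigma, d_\theta^l)\| + \|D_{02}l(\theta, d_\theta^l)\|\bigr)\,\pi_n(d\sigma) > \frac{\varepsilon}{2}.
\]

Taking into account that $\mathcal{L}$ is locally $\pi$-dominated, the outer probability of the event associated with the above property tends to 0 by M2. □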

Following the arguments of the proof of Lemma 8.4, we obtain the result 
below. 



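Lemma 8.5. For every $n \ge 1$, $s \in [0, 1]$ and $l \in \mathcal{S}_0$, let $t^l_{n,s} : \mathcal{X} \to \mathcal{D}$ be a map such that $\sup_{l \in \mathcal{S}_0} \sup_{s \in [0,1]} \|t^l_{n,s} - d_\theta^0\| \xrightarrow{Q^*} 0$. Then, under the assumptions M,

\[
\sup_{l \in \mathcal{S}_0} \sup_{s \in [0,1]} \|D_{01}l^n(t^l_{n,s}) - D_{01}l(\theta, d_\theta^0)\| \xrightarrow{Q^*} 0.
\]

The result is still true if $d_\theta^0$ and $\mathcal{S}_0$ are replaced by $d_\theta^l$ and $\mathcal{S}$, respectively, in which case $D_{01}l(\theta, d_\theta^l) = 0$.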

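Proposition 8.1. Under the assumptions M,

\[
\sqrt{n}\, \sup_{l \in \mathcal{S}_0} \bigl| l^n(d_n^0) - l(\theta, d_\theta^0) - D_{01}l(\theta, d_\theta^0)^t(d_n^0 - d_\theta^0) - D_{10}l(\theta, d_\theta^0)^t(\theta_n - \theta) \bigr| \xrightarrow{Q^*} 0.
\]

The result is still true if $d_n^0$, $d_\theta^0$ and $\mathcal{S}_0$ are replaced by $d_n^l$, $d_\theta^l$ and $\mathcal{S}$, respectively, in which case $D_{01}l(\theta, d_\theta^l) = 0$.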

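Proof. Let $l \in \mathcal{S}_0$. Then

\[
l^n(d_n^0) - l(\theta, d_\theta^0) = \bigl(l^n(d_n^0) - l^n(d_\theta^0)\bigr) + \bigl(l^n(d_\theta^0) - l(\theta, d_\theta^0)\bigr).
\]

By Taylor's formula, the first term on the right-hand side equals

\[
\int_0^1 D_{01}l^n\bigl(d_\theta^0 + s(d_n^0 - d_\theta^0)\bigr)^t (d_n^0 - d_\theta^0)\,ds,
\]

so that, by Lemma 8.5 and Theorem 4.1,

\[
\sqrt{n}\, \sup_{l \in \mathcal{S}_0} \bigl| l^n(d_n^0) - l^n(d_\theta^0) - D_{01}l(\theta, d_\theta^0)^t(d_n^0 - d_\theta^0) \bigr| \xrightarrow{Q^*} 0.
\]

Moreover, by Theorem 8.1,

\[
\sqrt{n}\, \sup_{l \in \mathcal{S}_0} \bigl| l^n(d_\theta^0) - l(\theta, d_\theta^0) - D_{10}l(\theta, d_\theta^0)^t(\theta_n - \theta) \bigr| \xrightarrow{Q^*} 0,
\]

which proves the proposition. □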

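8.4. Technical result related to weak convergence. Let $F(\mathcal{L})$ be the set of mappings from $\mathcal{L}$ into $\mathbb{R}$ and let $\mathcal{M}_{i,j}(F(\mathcal{L}))$ be the set of $i \times j$ matrices with coefficients in $F(\mathcal{L})$. For $A \in \mathcal{M}_{i,j}(F(\mathcal{L}))$, write $\|A\|_\infty = \sup_{l \in \mathcal{L}} \|A(l)\|$. The proof of the following lemma is left to the reader.

Lemma 8.6. For all $n \ge 1$, consider the maps $M_n : \mathcal{X} \to \mathcal{M}_{p,1}(F(\mathcal{L}))$, $A_n : \mathcal{X} \to \mathcal{M}_{p,p}(F(\mathcal{L}))$ and $R_n : \mathcal{X} \to \mathcal{M}_{p,1}(F(\mathcal{L}))$. Let $A \in \mathcal{M}_{p,p}(F(\mathcal{L}))$ be such that $\inf_{l \in \mathcal{L}} |\det A(l)| > 0$ and $\|A\|_\infty < \infty$. Assume that $A_n \xrightarrow{Q^*} A$ and $R_n \rightsquigarrow R$, where $R : \mathcal{X} \to \mathcal{M}_{p,1}(F(\mathcal{L}))$ is Borel measurable, and that $\|A_n M_n - R_n\|_\infty \xrightarrow{Q^*} 0$. Then we have $\|M_n - A^{-1}R_n\|_\infty \xrightarrow{Q^*} 0$.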
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